AITopics | label and caption

Variational Autoencoder for Deep Learning of Images, Labels and Captions

Neural Information Processing SystemsNov-21-2025, 15:33:24 GMT

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.

deep learning, name change, variational autoencoder, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.61)

Add feedback

Reviews: Variational Autoencoder for Deep Learning of Images, Labels and Captions

Neural Information Processing SystemsJan-20-2025, 22:11:49 GMT

This paper presents a model and method which are likely to be of interest to many in the community. The formulation allows stochastic layers of a convolution-deconvolution structured autoencoder with stochastic layers to be parameterized and manipulated in such a way that inference can be performed in a much more computationally efficient way compared to Gibbs sampling and MCEM techniques. Results on CIFAR are fairly far from state of the art, but illustrate the key contributions related to efficient inference well. Results for a few other methods would be useful to include in Table 1 to give the reader some context as to what regime this approach and the examined model are operating within. The Flickr8k, 30k and MS COCO results seem quite strong, especially given that this work does not focus on the application, but rather the proposed VAE method.

deep learning, label and caption, variational autoencoder, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Variational Autoencoder for Deep Learning of Images, Labels and Captions

Pu, Yunchen, Gan, Zhe, Henao, Ricardo, Yuan, Xin, Li, Chunyuan, Stevens, Andrew, Carin, Lawrence

Neural Information Processing SystemsFeb-14-2020, 11:13:04 GMT

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.

deep learning, label and caption, variational autoencoder, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Variational Autoencoder for Deep Learning of Images, Labels and Captions

Pu, Yunchen, Gan, Zhe, Henao, Ricardo, Yuan, Xin, Li, Chunyuan, Stevens, Andrew, Carin, Lawrence

arXiv.org Machine LearningSep-28-2016

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.

deep learning, machine learning, variational autoencoder, (2 more...)

arXiv.org Machine Learning

1609.08976

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Collaborating Authors

label and caption

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Variational Autoencoder for Deep Learning of Images, Labels and Captions

Reviews: Variational Autoencoder for Deep Learning of Images, Labels and Captions

Variational Autoencoder for Deep Learning of Images, Labels and Captions

Variational Autoencoder for Deep Learning of Images, Labels and Captions